AITopics | sketch understanding



Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Inference. Suppose S: X → R is a continuous set function w.r.t. the Hausdorff distance d_H(·,·). For every ε > 0, for any function f and any invertible map P: X → R^n, there exist functions h and g such that for any X ∈ X: |S(X) − g(P ...

Neural Information Processing Systems | Feb-7-2026, 13:06:26 GMT

Theorem 2. The instances in the bag are represented by random variables Θ1, Θ2, ..., Θn. The information entropy of the bag under the correlation assumption can be expressed as H(Θ1, Θ2, ..., Θn), and the information entropy of the bag under the i.i.d. [...] Therefore, it is proved that the information source under the correlation assumption has smaller information entropy. In other words, the correlation assumption reduces the uncertainty and brings more useful information. Given a set of bags {X1, X2, ..., Xb}, each bag Xi contains multiple instances {xi,1, xi,2, ..., xi,n} and a corresponding label Yi. Obviously, the key to Transformer-based MIL is how to design the mapping X → T. However, there are many difficulties in directly applying the Transformer to WSI classification, including the large number of instances in each bag and the large variation in the number of instances across bags (e.g., ranging from hundreds to thousands).
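The entropy claim in this excerpt is consistent with the subadditivity of Shannon entropy: the joint entropy of a set of variables never exceeds the sum of their marginal entropies, with equality only under independence. The following is a minimal numeric check, assuming a made-up joint distribution over two binary instances; the distribution and all names are illustrative and do not come from the paper.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginals(joint):
    """Marginal distributions of a two-variable joint table {(a, b): p}."""
    first, second = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        first[a] += p
        second[b] += p
    return first, second


# Hypothetical joint distribution of two positively correlated binary instances.
correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

m1, m2 = marginals(correlated)
h_joint = entropy(correlated.values())                         # H(T1, T2)
h_independent = entropy(m1.values()) + entropy(m2.values())    # H(T1) + H(T2)
print(f"H(T1, T2) = {h_joint:.3f} bits; H(T1) + H(T2) = {h_independent:.3f} bits")
```

The gap between the two values is the mutual information between the instances, which is the quantity an independence assumption discards.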

artificial intelligence, image understanding, sketch understanding, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (0.42)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.42)


ProHD: Projection-Based Hausdorff Distance Approximation

Fu, Jiuzhou, Guo, Luanzheng, Tallent, Nathan R., Zhao, Dongfang

arXiv.org Artificial Intelligence | Nov-25-2025

The Hausdorff distance (HD) is a robust measure of set dissimilarity, but computing it exactly on large, high-dimensional datasets is prohibitively expensive. We propose ProHD, a projection-guided approximation algorithm that dramatically accelerates HD computation while maintaining high accuracy. ProHD identifies a small subset of candidate "extreme" points by projecting the data onto a few informative directions (such as the centroid axis and top principal components) and computing the HD on this subset. This approach guarantees an underestimate of the true HD with a bounded additive error and typically achieves results within a few percent of the exact value. In extensive experiments on image, physics, and synthetic datasets (up to two million points in D = 256), ProHD runs 10-100× faster than exact algorithms while attaining 5-20× lower error than random sampling-based approximations. Our method enables practical HD calculations in scenarios like large vector databases and streaming data, where quick and reliable set distance estimation is needed.
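To make the idea in this abstract concrete, here is a rough sketch of the exact Hausdorff distance next to a simplified projection-guided estimate. It is not the authors' algorithm: it assumes a single projection direction (the axis between the two centroids, with no principal components), and the function names and the parameter `k` are hypothetical. Candidate points are compared against the full opposite set, which is one simple way to guarantee the estimate never exceeds the exact value.

```python
import math
import random


def directed_hausdorff(A, B):
    """Exact directed distance h(A, B): max over a in A of the distance to its nearest b in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Exact (symmetric) Hausdorff distance."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


def centroid(P):
    n = len(P)
    return [sum(p[i] for p in P) / n for i in range(len(P[0]))]


def extreme_points(P, direction, k):
    """The k points with the smallest and the k with the largest projection onto direction."""
    order = sorted(P, key=lambda p: sum(x * d for x, d in zip(p, direction)))
    return order[:k] + order[-k:]


def approx_hausdorff(A, B, k=5):
    """Simplified estimate (assumption: centroid axis only, unlike the full method).

    Only the candidate subsets enter the outer max, while the inner min still
    runs over the full opposite set, so the result is a lower bound on hausdorff(A, B).
    """
    ca, cb = centroid(A), centroid(B)
    axis = [y - x for x, y in zip(ca, cb)]
    if all(abs(v) < 1e-12 for v in axis):
        axis = [1.0] + [0.0] * (len(ca) - 1)  # fallback when the centroids coincide
    cand_a = extreme_points(A, axis, k)
    cand_b = extreme_points(B, axis, k)
    return max(directed_hausdorff(cand_a, B), directed_hausdorff(cand_b, A))


# Synthetic demo data: two Gaussian clouds in 16 dimensions, the second one shifted.
rng = random.Random(0)
A = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
B = [[rng.gauss(0.5, 1.0) for _ in range(16)] for _ in range(200)]
print("exact:", hausdorff(A, B), "estimate:", approx_hausdorff(A, B, k=10))
```

The exact computation costs one distance evaluation per pair of points, whereas the estimate evaluates only the candidates against each full set, which is where the speedup in this kind of scheme comes from.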

artificial intelligence, image understanding, prohd, (16 more...)

arXiv.org Artificial Intelligence

2511.18207

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.68)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
Information Technology > Artificial Intelligence > Vision > Sketch Understanding (0.65)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.65)


RadarSFD: Single-Frame Diffusion with Pretrained Priors for Radar Point Clouds

Zhao, Bin, Garg, Nakul

arXiv.org Artificial Intelligence | Sep-23-2025

Millimeter-wave radar provides perception robust to fog, smoke, dust, and low light, making it attractive for size, weight, and power constrained robotic platforms. Current radar imaging methods, however, rely on synthetic aperture or multi-frame aggregation to improve resolution, which is impractical for small aerial, inspection, or wearable systems. We present RadarSFD, a conditional latent diffusion framework that reconstructs dense LiDAR-like point clouds from a single radar frame without motion or SAR. Our approach transfers geometric priors from a pretrained monocular depth estimator into the diffusion backbone, anchors them to radar inputs via channel-wise latent concatenation, and regularizes outputs with a dual-space objective combining latent and pixel-space losses. On the RadarHD benchmark, RadarSFD achieves 35 cm Chamfer Distance and 28 cm Modified Hausdorff Distance, improving over the single-frame RadarHD baseline (56 cm, 45 cm) and remaining competitive with multi-frame methods using 5-41 frames. Qualitative results show recovery of fine walls and narrow gaps, and experiments across new environments confirm strong generalization. Ablation studies highlight the importance of pretrained initialization, radar BEV conditioning, and the dual-space loss. Together, these results establish the first practical single-frame, no-SAR mmWave radar pipeline for dense point cloud perception in compact robotic systems.
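The two point cloud metrics quoted in this abstract can be sketched as follows. The Modified Hausdorff Distance follows the usual definition (the larger of the two mean nearest-neighbor distances). Chamfer Distance has several conventions in the literature (summed or averaged, squared or not); the version below sums the two unsquared means, which is an assumption and may differ from the one used on the RadarHD benchmark. The sample points are made up.

```python
import math


def mean_nearest(A, B):
    """Mean distance from each point in A to its nearest neighbor in B."""
    return sum(min(math.dist(a, b) for b in B) for a in A) / len(A)


def chamfer_distance(A, B):
    """One common convention: sum of the two directed mean nearest-neighbor distances."""
    return mean_nearest(A, B) + mean_nearest(B, A)


def modified_hausdorff(A, B):
    """Modified Hausdorff Distance: max of the two directed mean nearest-neighbor distances."""
    return max(mean_nearest(A, B), mean_nearest(B, A))


# Hypothetical predicted and ground-truth 2D points, in meters.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]

print("Chamfer:", chamfer_distance(predicted, ground_truth))
print("MHD:", modified_hausdorff(predicted, ground_truth))
```

Both metrics average over points, so a single outlier shifts them far less than it would shift the classical max-min Hausdorff distance.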

artificial intelligence, diffusion model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.18068

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.69)
Information Technology > Artificial Intelligence > Vision > Sketch Understanding (0.55)


Robust 2D lidar-based SLAM in arboreal environments without IMU/GNSS

Nazate-Burgos, Paola, Torres-Torriti, Miguel, Aguilera-Marinovic, Sergio, Arévalo, Tito, Huang, Shoudong, Cheein, Fernando Auat

arXiv.org Artificial Intelligence | May-19-2025

Simultaneous localization and mapping (SLAM) for mobile robots remains challenging in forest or arboreal fruit farming environments, where tree canopies obstruct Global Navigation Satellite Systems (GNSS) signals. Unlike indoor settings, these agricultural environments pose additional challenges due to outdoor variables such as foliage motion and illumination variability. This paper proposes a solution based on 2D lidar measurements, which requires less processing and storage, and is more cost-effective, than approaches that employ 3D lidars. Utilizing the modified Hausdorff distance (MHD) metric, the method can solve the scan matching robustly and with high accuracy without needing sophisticated feature extraction. The method's robustness was validated using public datasets and considering various metrics, facilitating meaningful comparisons for future research. Comparative evaluations against state-of-the-art algorithms, particularly A-LOAM, show that the proposed approach achieves lower positional and angular errors while maintaining higher accuracy and resilience in GNSS-denied settings. This work contributes to the advancement of precision agriculture by enabling reliable and autonomous navigation in challenging outdoor environments.
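As an illustration of scan matching driven by the modified Hausdorff distance, the sketch below searches for the rigid 2D transform that best aligns a new scan with a reference scan. The abstract does not describe the optimizer, so the exhaustive grid search, the synthetic wall-shaped scan, and every name here are assumptions made for the example rather than the paper's method.

```python
import math
from itertools import product


def mean_nearest(A, B):
    """Mean distance from each point in A to its nearest neighbor in B."""
    return sum(min(math.dist(a, b) for b in B) for a in A) / len(A)


def modified_hausdorff(A, B):
    return max(mean_nearest(A, B), mean_nearest(B, A))


def transform(points, theta, tx, ty):
    """Rotate 2D points by theta (radians), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def match_scans(scan, reference, thetas, shifts):
    """Grid search for the transform minimizing MHD(transform(scan), reference)."""
    best = None
    for theta, tx, ty in product(thetas, shifts, shifts):
        score = modified_hausdorff(transform(scan, theta, tx, ty), reference)
        if best is None or score < best[0]:
            best = (score, theta, tx, ty)
    return best


# Hypothetical reference scan: two perpendicular rows of returns, sampled every 0.5 m.
reference = [(0.5 * i, 0.0) for i in range(10)] + [(0.0, 0.5 * j) for j in range(1, 8)]

# Simulate a new scan by applying the inverse of a known pose change to the reference.
true_theta, true_tx, true_ty = math.radians(10), 0.4, -0.2
shifted = [(x - true_tx, y - true_ty) for x, y in reference]
scan = transform(shifted, -true_theta, 0.0, 0.0)

thetas = [math.radians(a) for a in range(-20, 21, 5)]
shifts = [-0.4, -0.2, 0.0, 0.2, 0.4]
score, theta, tx, ty = match_scans(scan, reference, thetas, shifts)
print(f"theta = {math.degrees(theta):.1f} deg, tx = {tx}, ty = {ty}, MHD = {score:.2e}")
```

No features are extracted at any point: the raw point sets are compared directly, which is the property the abstract highlights. A practical system would replace the grid with a local optimizer and a coarse-to-fine search.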

a-loam, artificial intelligence, image understanding, (11 more...)

arXiv.org Artificial Intelligence

2505.10847

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (0.57)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.57)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.49)


Hashigo: A Next Generation Sketch Interactive System for Japanese Kanji

Taele, Paul, Hammond, Tracy

arXiv.org Artificial Intelligence | Apr-22-2025

Language students can increase their effectiveness in learning written Japanese by mastering the visual structure and written technique of Japanese kanji. Yet, existing kanji handwriting recognition systems do not assess the written technique sufficiently to discourage students from developing bad learning habits. In this paper, we describe our work on Hashigo, a kanji sketch interactive system which achieves human instructor-level critique and feedback on both the visual structure and written technique of students' sketched kanji. This type of automated critique and feedback allows students to target and correct specific deficiencies in their sketches that, if left untreated, are detrimental to effective long-term kanji learning.

artificial intelligence, handwriting recognition, student, (15 more...)

arXiv.org Artificial Intelligence

2504.1394

Country:

Asia > Japan > Honshū (0.28)
North America > United States > Texas > Brazos County > College Station (0.14)

Genre: Research Report (0.40)

Industry:

Education > Curriculum > Subject-Specific Education (0.94)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)


Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding

Forbus, Kenneth D., Chen, Kezhen, Xu, Wangcheng, Usher, Madeline

arXiv.org Artificial Intelligence | Jul-5-2024

One of the purposes of perception is to bridge between sensors and conceptual understanding. Marr's Primal Sketch combined initial edge-finding with multiple downstream processes to capture aspects of visual perception such as grouping and stereopsis. Given the progress made in multiple areas of AI since then, we have developed a new framework inspired by Marr's work, the Hybrid Primal Sketch, which combines computer vision components into an ensemble to produce sketch-like entities which are then further processed by CogSketch, our model of high-level human vision, to produce both more detailed shape representations and scene representations which can be used for data-efficient learning via analogical generalization. This paper describes our theoretical framework, summarizes several previous experiments, and outlines a new experiment in progress on diagram understanding.

artificial intelligence, qualitative reasoning, representation, (17 more...)

arXiv.org Artificial Intelligence

2407.04859

Country: North America > United States > Illinois (0.28)

Genre: Research Report (0.64)

Industry:

Education (0.46)
Health & Medicine (0.46)
Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Qualitative Reasoning (0.93)


Sampling and Ranking for Digital Ink Generation on a tight computational budget

Afonin, Andrei, Maksai, Andrii, Timofeev, Aleksandr, Musat, Claudiu

arXiv.org Artificial Intelligence | Jun-2-2023

Digital ink (online handwriting) generation has a number of potential applications for creating user-visible content, such as handwriting autocompletion, spelling correction, and beautification. Writing is personal and usually the processing is done on-device. Ink generative models thus need to produce high quality content quickly, in a resource constrained environment. In this work, we study ways to maximize the quality of the output of a trained digital ink generative model, while staying within an inference time budget. We use and compare the effect of multiple sampling and ranking techniques, in the first ablation study of its kind in the digital ink domain. We confirm our findings on multiple datasets - writing in English and Vietnamese, as well as mathematical formulas - using two model types and two common ink data representations. In all combinations, we report a meaningful improvement in the recognizability of the synthetic inks, in some cases more than halving the character error rate metric, and describe a way to select the optimal combination of sampling and ranking techniques for any given computational budget.
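The character error rate mentioned in this abstract is commonly computed as the edit distance between the recognized text and the reference label, divided by the reference length. The sketch below pairs that metric with a generic sample-then-rank loop. The `sampler` and `ranker` callables are hypothetical stand-ins: in the paper's setting they would be an ink generative model and a ranking model, neither of which is reproduced here.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(recognized, reference):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(recognized, reference) / len(reference)


def sample_and_rank(sampler, ranker, num_samples):
    """Draw num_samples candidates and keep the one the ranker scores highest."""
    candidates = [sampler() for _ in range(num_samples)]
    return max(candidates, key=ranker)


# Toy stand-ins: pretend each sample is the recognizer's reading of one generated ink.
readings = iter(["helo", "hello", "hxllo"])
best = sample_and_rank(lambda: next(readings),
                       lambda text: -character_error_rate(text, "hello"),
                       num_samples=3)
print(best)
```

Each extra sample costs one more pass of the generator and the ranker, so `num_samples` is the knob that trades output quality against the inference time budget.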

machine learning, natural language, ranking model, (18 more...)

arXiv.org Artificial Intelligence

2306.03103

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Document-Level Multi-Event Extraction with Event Proxy Nodes and Hausdorff Distance Minimization

Wang, Xinyu, Gui, Lin, He, Yulan

arXiv.org Artificial Intelligence | May-30-2023

Document-level multi-event extraction aims to extract the structural information from a given document automatically. Most recent approaches usually involve two steps: (1) modeling entity interactions; (2) decoding entity interactions into events. However, such approaches ignore a global view of the inter-dependency of multiple events. Moreover, an event is decoded by iteratively merging its related entities as arguments, which might suffer from error propagation and is computationally inefficient. In this paper, we propose an alternative approach for document-level multi-event extraction with event proxy nodes and Hausdorff distance minimization. The event proxy nodes, representing pseudo-events, are able to build connections with other event proxy nodes, essentially capturing global information. The Hausdorff distance makes it possible to compare the similarity between the set of predicted events and the set of ground-truth events. By directly minimizing the Hausdorff distance, the model is trained towards the global optimum, which improves performance and reduces training time. Experimental results show that our model outperforms the previous state-of-the-art method in F1-score on two datasets with only a fraction of the training time.
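The set-to-set comparison described in this abstract can be illustrated with the Hausdorff distance over event records instead of geometric points. This is only a sketch: the event representation, the role-mismatch distance, and the example events are hypothetical. The paper's training objective presumably operates on learned representations with a differentiable formulation, which is not reproduced here.

```python
def event_distance(pred, gold):
    """Hypothetical distance between two events: 1.0 if the event types differ,
    otherwise the fraction of roles whose fillers disagree."""
    if pred["type"] != gold["type"]:
        return 1.0
    roles = set(pred["args"]) | set(gold["args"])
    if not roles:
        return 0.0
    mismatches = sum(pred["args"].get(r) != gold["args"].get(r) for r in roles)
    return mismatches / len(roles)


def hausdorff(predicted, gold, dist):
    """Hausdorff distance between two non-empty sets under an arbitrary pairwise distance."""
    forward = max(min(dist(p, g) for g in gold) for p in predicted)
    backward = max(min(dist(p, g) for p in predicted) for g in gold)
    return max(forward, backward)


# Made-up events for illustration only.
gold_events = [
    {"type": "Pledge", "args": {"pledger": "A", "shares": "100"}},
    {"type": "Freeze", "args": {"holder": "B"}},
]
predicted_events = [
    {"type": "Pledge", "args": {"pledger": "A", "shares": "200"}},
    {"type": "Freeze", "args": {"holder": "B"}},
]

print(hausdorff(predicted_events, gold_events, event_distance))
```

Because the distance is defined on whole sets, no explicit one-to-one alignment between predicted and ground-truth events is needed before scoring.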

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.18926

Country:

Asia > China (0.46)
Oceania > Australia (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe (0.14)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)


Google adds Digital Ink Recognition API for touch and stylus input to ML Kit

#artificialintelligence | Aug-6-2020, 18:25:19 GMT

A month after announcing changes to ML Kit, its toolset for developers to infuse apps with AI, Google today launched the Digital Ink Recognition API on Android and iOS to allow developers to create apps where stylus and touch act as inputs. As the name implies, the API -- which is powered by the same technology underpinning Google's Gboard software keyboard, Quick Draw, and AutoDraw -- looks at a user's strokes on the screen and recognizes what they're writing or drawing. Google says that with the new Digital Ink Recognition API, developers can enable users to input text and figures with a finger and stylus or transcribe handwritten notes to make them searchable. Classifiers parse written text into a string of characters; other classifiers describe shapes such as drawings, sketches, and emojis by the class to which they belong (e.g., circle, square, happy face, and so on). The Digital Ink Recognition API performs processing in near-real-time and on-device, according to Google, with support for over 300 languages and more than 25 writing systems including all major Latin languages, Chinese, Japanese, Korean, Arabic, and Cyrillic.

artificial intelligence, ml kit, sketch understanding, (4 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)


Interactive Cognitive Assessment Tools: A Case Study on Digital Pens for the Clinical Assessment of Dementia

Sonntag, Daniel

arXiv.org Artificial Intelligence | Oct-11-2018

Interactive cognitive assessment tools may be valuable for doctors and therapists to reduce costs and improve quality in healthcare systems. Use cases and scenarios include the assessment of dementia. In this paper, we present our approach to the semi-automatic assessment of dementia. We describe a case study with digital pens for the patients including background, problem description and possible solutions. We conclude with lessons learned when implementing digital tests, and a generalisation for use outside the cognitive impairments field.

artificial intelligence, assessment, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1810.04943

Country: Europe > Germany (0.46)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Dementia (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
(3 more...)
